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We present a novel algorithm which can overcome the drawbacks of the conventional 
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1. INTRODUCTION 

Electronic structure calculations often provide very accurate physical and chemical prop- 
erties of various microscopic systems from first principles, which makes them attractive to 
experimentalists as well as theoreticians. 1-5 However, the computational cost of such calcu- 
lations grows cubically (or faster) with the number of atoms in the system, thereby limiting 
the maximum number of atoms to ~ 10 3 on today's computers. Therefore, much effort has 
been devoted to the development of so-called linear scaling methods, whose computational 
cost grows only linearly with system size, 5-8 usually by making some assumptions about the 
electronic structure of the system. 9 

The emergence of the linear scaling methods has also promoted the development of various 
discretization schemes in real space in the last decade, 10-13 such as finite difference and finite 
element methods. These real space methods are considered more appropriate for linear scaling 
methods than plane waves, because they can easily take advantage of the localization of 
electrons 9 while retaining systematic convergence. Alternatively, the use of atomic basis set 
in linear scaling methods is also an attractive approach. 7 ' 14-16 

From the point of view of computational cost, the orbital minimization method 
(OMM), 17-20 which is designed for nonmetallic systems, is among the most promising lin- 
ear scaling algorithms proposed so far. Moreover, OMM is easy to implement, and is able to 
deal with nonorthogonal basis functions without much difficulty. Therefore, much work has 
been carried out on the implementation of OMM, including first-principles calculations using 
real-space methods. 21-27 Unfortunately, if localization constraints are imposed on the orbitals 
to achieve linear scaling, a naive implementation of OMM suffers from several drawbacks, 6 
which has discouraged the use of OMM in realistic applications to date. In the present paper, 
we propose a simple yet effective algorithm which can overcome the drawbacks of OMM when 
the electronic structure of the system is qualitatively predictable. 
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2. ALGORITHM 

First of all, we briefly describe the basic formalism of electronic structure calculations here. 
For notational simplicity, only non-self-consistent problems are considered here, but extention 
to self-consistent ones is straightforward. Moreover, we assume that the orbitals are real, and 
that there is no spin degeneracy. We also assume the presence of an energy gap between the 
occupied and unoccupied states throughout the paper. Then, the conventional total energy 
functional ©total is given by 

N 

E total [i>] = Y,H tt , (1) 
i=i 

where the Hamiltonian H = — V 2 + V, Ha =< ipi\H\^i >, V is the potential felt by the 
electrons, and N is the number of occupied orbitals. 28 If ©total is minimized with repect to 
the orbitals {i^i{r)}f =1 under the orthonormality constraints 

<^|^>=<%, (2) 

©total and {ipi} will converge to the sum of the iV lowest eigenvalues of Ji and corresponding 
eigenstates, respectively, except for the degrees of freedom associated with any unitary trans- 
formation. This redundancy can be exploited to construct the maximally localized Wannier 
functions (MLWFs), 29 ' 30 whose spread in real space, 

N 

n = Y,(<r 2 > i -<r> 2 t ), 
i=i 

is minimum among all states given by the unitary transformation of the eigenstates. An 
efficient calculation of MLWFs along the trajectory of Car-Parrinello dynamics is also an 
active area of research. 31-34 In the following, MLWFs are denoted by {wi(r)}^L 1 . 

On the other hand, the generalized total energy functional 17 ' 35-37 used in OMM is given 

by 

N 

E tot M=Y,(S~ 1 k-H ij , (3) 

where Hij =<tpi\7i\ipj>, and the overlap matrix S is defined by Sij =<ipi\ipj>. The mini- 
mum value of -©total agrees with that of ©total > and {V'i} that minimize ©total are nonorthogonal 
functions that span the same subspace as the N lowest eigenstates of H. While there are sev- 
eral variants of this functional 18-20 which rely on the Neumann expansion of S~ x , we will 
not go into detail here. In analogy with the case of ©total, ©total is invariant under the linear 
transformation 

|^>=^A^|V/-> (4) 

j 

for any nonsingular matrix X of size N. Therefore, attempts have been made to construct 
nonorthogonal localized orbitals (NOLOs), which can be more localized than MLWFs by 
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taking advantage of the higher degree of freedom. 38- However, special attention has to be 
paid to the risk of falling into linearly dependent states while constructing NOLOs. 

In order to achieve linear scaling with the OMM, localization constraints are imposed 
on the orbitals (Fig.l (a)) during the minimization of -Etotai- 17 When each orbital is strictly 
localized within a given region of space, called the localization region (LR), S and H would 
be sparse matrices, and the computational cost of evaluating each nonzero element of S and 
H would be independent of system size. 41 Therefore, -Etotai as well as its gradient can be 
calculated with linear scaling in a straightforward manner. 42 Moreover, the optimized orbitals 
are expected to be good approximations to MLWFs, which are least likely to be influenced by 
the localization constraints among the unitary transformation of the ground state. Therefore, 
the centers of LRs are usually chosen as close to those of MLWFs as possible. 

Unfortunately, in the presence of localization constraints, iterative minimization of the 
total energy becomes extremely difficult, 6 often requiring hundreds or thousands of iterations 
to converge. Furthermore, the orbitals can be trapped at local minima during the minimization 
process, 6 ' 19,27 which results in poor conservation of the total energy in molecular-dynamics 
simulations. The major source of these problems is that -Etotai has a pathological shape around 
the minimum, which arises from the fact that -Etotai is only approximately invariant under the 
linear transformation of Eq.(4) in the presence of localization constraints. 6,43 

While much effort has been made to overcome these problems, 6,19,44 the performance 
and reliability of OMM under the localization constraints still appear to be insufficient for 
routine use in realistic applications. In the following, we present a simple prescription to make 
OMM a practical linear scaling algorithm with minimal computational overhead. To this end, 
we introduce here the concept of kernel region (KR), which plays an important role in the 
algorithm explained below. For simplicity, we assume that only one orbital is assigned to each 
LR, but extension to the multi-orbital case is straightforward. Then, KRs of a given system 
are generated under the following conditions: 

(a) Each LR includes its own KR, which preferably includes the center of a MLWF. 

(b) There is no overlap between any two KRs. 

(c) No partial overlap between any LR and KR is allowed. 

An example of a set of LRs and KRs that satisfy these conditions is shown in Fig.l (b). In 
practice, we first generate LRs and KRs temporarily, e.g. by the distance criterion, that satisfy 
conditions (a) and (b). Then, if more(less) than a given fraction (say, 40 %) of each KR is 
included in some other LR, the border of that LR is modified to include (exclude) the KR 
completely, thus satisfying condition (c). Since KRs are usually much smaller than LRs, these 
modifications will not have a significant impact on the shape of LRs. In the following, LRs 
and KRs that satisfy the above conditions are denoted by {Li]f =1 and {Ki}f =l , respectively. 
We now define the kernel functions {xi{ r )}f=i that have the following properties: 
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(i) Xi( r ) approximates Wi(r) when r £ K^. 

(ii) Xi{ r ) = when r £ K{. 
(hi) <Xi \Xi>= !• 

Therefore, <Xi\Xj>= <% is satisfied automatically. 

In the augmented orbital minimization method (AOMM), -E t otai is minimized with respect 
to the localized orbitals {tpi} under the additional constraints that 

<Xj\A>=0 (5) 

for any j ^ i, where i,j = 1, 2, • • • N. The role of these constraints is to orthogonalize ipi(r) 
approximately to Wj(r) for any j ^ i, in the hope that V'ifV) will be a good approximation to 
Wi(r) at the minimum. Eq.(5) is satisfied by an explicit orthogonalization as 

| 4 >= Pi\ Y>; >= | > - I *J I ^ >' ( 6 ) 

where the projection operator is given by 

Pi = I-Y J \X3><Xj\- (7) 

Summation with respect to j should be taken only if Kj £ Li, because < Xj I >= otherwise. 
Therefore, the computational cost of projection is relatively minor, scaling only linearly with 
system size. Note that if 4>i(r) is localized within Lj, so is ^[(r) due to the properties of KRs 
and kernel functions. Moreover, each orbital remains unchanged inside its own KR after the 
projection, i.e. ^[(r) = ipi(r) if r G 

There is no unique way to define the kernel functions for given KRs, but if those KRs 
are used as the LRs in the conventional OMM, the optimized orbitals will serve as the kernel 
functions. These are called static kernel functions, since they do not change during the elec- 
tronic structure calculations. Note that there is no slow convergence or local minima problem 
when the LRs do not overlap. An alternative way to define the kernel functions is to use a 
mask function mj(r), such that mj(r) = 1 when r € K; L and wij(r) = otherwise. 45 Then, 
if the orbitals are reasonably close to the ground state, {rrii{r)i)i{r)}f =1 can be used as the 
kernel functions after normalization. We call them dynamic kernel functions, because they are 
updated at every step of the minimization. 

When the kernel functions do not depend on the orbitals, the gradient of £totai under the 
constraints of Eq. (5) is given by 

I - d ^total (Q \ 

\*>=Pi-Qj-i (8) 
which can also be evaluated with linear scaling effort. If dynamic kernel functions are used, 
a correction term is required to take into account the dependence of {xi} on {tpi}, which, 
however, can be calculated in a straightforward manner. Fig.2 shows the flow chart of the 
electronic structure calculation for a given ionic configuration in the AOMM. 
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3. RESULTS 

The performance of our algorithm is first evaluated in a simple one-dimensional problem. 
We consider a system consisting of 5 electrons in the potential wells shown in Fig. 3, where 
x = 0, 1, ■ ■ ■ , 160, and vanishing boundary conditions are imposed on the orbitals. When a 
3-point finite-difference approximation is used for the Laplacian, the Hamiltonian 7i is given 
by a tridiagonal matrix of size 161x161 as 

/ 2 + v -1 \ 
-1 2 + V! -1 



n 



-1 2 + ^159 -1 

\ -1 2 + t;i6o / 



(9) 



We used 5 pairs of LRs and KRs centered at 40, 60, 80, 100, and 120, the radii of which are 
denoted by -R L r and -Rkr, respectively. Therefore, each LR(KR) is given 2_R L r+1 (2i? K R+l) 
degrees of freedom. It is worth noting that in this system the conditions (a)-(c) given in the 
previous section translate into the inequalities as follows: (a) R KR < Rlr, (b) < R KR < 9, 
and (c) 20 - R KR < R hR < 20 + R KR , 40 - R KR < R hR < 40 + R KR , • • • are not allowed. The 
centers of the MLWFs constructed from the 5 lowest eigenstates of H are given by (39.66, 
60.02, 80.03, 99.98, 120.27), which justify our choice of LRs and KRs. We used static kernel 
functions which were calculated in advance, as explained in the previous section. The kernel 
functions which belong to the central KR are compared with the MLWF in Fig. 4. 

The ground state of this system was calculated iteratively by the conjugate gradient 
method 46 with no preconditioning. Ground state calculations were repeated 100 times from 
different random initial states, 47 from which statistics were collected. Each calculation was 
terminated successfully when the total energy difference between two successive steps was 
smaller than 10~ n . If convergence was not achieved after 1000 iterations, the calculation was 
regarded as a failure, which was excluded from the statistics. 

Fig.5 (a) shows the number of unsuccessful calculations as functions of i? LR for the OMM 
and AOMM. For small values of -Rlr, where only a small portion of the neighboring LRs 
overlap, both methods work equally well. In the OMM, however, this number grows rapidly as 
the LRs begin to include the centers of neighboring LRs at R LR « 20, and the iterations almost 
always fail to converge when the second nearest neighbors are also included at Rlr ~ 40. In 
contrast, no failure is observed in the AOMM for all values of -Rlr, which clearly demonstrates 
the advantage of AOMM over the OMM. 

Average number of iterations for OMM (Fig.5 (b)) shows a similar tendency. While the 
convergence rate also slowly deteriorates with -R L r in the AOMM, this problem is easily 
overcome by a suitable preconditioner and/or the multigrid method. 10 



6/?? 



J. Phys. Soc. Jpn. Full Paper 

Fig. 5 (c) shows the relative errors in total energy from the exact value obtained by di- 
agonalization of 7i. For comparison, we also show the values for the MLWFs, which are first 
projected onto the LRs of size -Rlr> followed by smoothing at the boundaries. While the OMM 
gives the fastest convergence with respect to -Rlr> the errors saturate at -Rlr ~ 40, because 
the optimization is always trapped at local minima. In contrast, no saturation is observed in 
the results of AOMM, even if the convergence is slower due to the additional constraints of 
Eq.(5). Overall, AOMM values are very close to those of MLWFs, including the slowdown at 
-Rlr ~ 20 and 40, but converge somewhat faster. Moreover, no local minima were found in 
the AOMM. 

The determinant of the overlap matrix S at the ground state is shown in Fig. 5 (d). While 
AOMM and MLWF behave similarly, the asymptotic value of AOMM 0.986) is slightly 
smaller than that of MLWF (=1). In contrast, OMM values keep decreasing with -Rlr, which 
implies that the orbitals are falling into linearly dependent states. 

Fig. 5 (e) shows the average spread a of the orbitals, where a = x 2 >i ~ < x >f 

)z/N. Although AOMM and MLWF give very similar results, OMM values increase steadily 
with -Rlr, which suggests that the orbitals deviate from the picture of MLWFs at large -Rlr- 

To promote further understanding of this point, the optimized orbitals which belong to 
the central LR are compared with the MLWF in Fig. 6. A prominent feature of the MLWF is 
the oscillatory behavior at large distances from the center, called the orthogonalization tail, 40 
which arises from the orthogonality constraints of Eq.(2). While the orbitals obtained from 
AOMM are very similar to the MLWF, they decay faster at large distances, particularly when 
i?KR is small. In contrast, the orbital from OMM exhibits irregular behavior, as expected from 
the large a mentioned above. 

We have also implemented AOMM in our first-principles code FEMTECK (Finite Element 
Method based Total Energy Calculation Kit) 21 ' 48 to assess its performance under realistic 
conditioins. We have carried out ab initio molecular-dynamics simulations of liquid water at 
ambient conditions using a cubic supercell of side 29.35 Bohr containing 125 molecules. All 
hydrogen atoms in the system were given the mass of deuterium, and a timestep of 40 a.u. 
(~ 0.97 fs) was used in all simulations. We used 125 pairs of LRs and KRs, all of which 
are centered at the oxygen atoms, and 4 orbitals were assigned to each LR and KR. The 
orbitals were optimized using a limited-memory variant of the quasi-Newton method 49-51 
with a tolerance of 2 x 10~ 10 Ry/orbital. Other details of the simulations are described in our 
recent publications. 52 ' 53 Table I shows the details of 4 runs, where dynamic kernel functions 
were used in all AOMM runs. 54 Fig. 7(a) shows the time evolution of the total energy and 
potential energy for extended orbitals, which proves the accuracy of the ionic forces in our 
simulations. Fig. 7(b) and (c) show the total energies and errors in ionic temperature during 
the molecular-dynamics simulations. Ionic forces were calculated under the assumption that 
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all LRs and KRs are fixed in space. In reality, neither assumption is true, which explains the 
irregular behavior of the total energies when i? L R is small. However, conservation of the total 
energy for i? LR = 12 Bohr is already competitive with that of the extended orbitals. The ionic 
temperature in the OMM run is also reproduced with an error of < 1 K when i? LR = 12 Bohr. 
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4. DISCUSSION 

In this section, we make several observations on the properties of AOMM. 

When all LRs are extended, each LR will include all the KRs. In this case, Eq.(5) imposes 
N 2 — N constraints, which is equivalent to the number of degrees of freedom associated with 
Eq.(4) (assuming that each orbital is normalized). Therefore, the ground state of the system 
is uniquely determined (except for sign) with no loss of accuracy, since the redundancy of the 
orbitals is completely removed. If the LRs have a finite size smaller than the unit cell, -Etotai 
is no longer invariant under the transformation of Eq.(4). Nevertheless, if a large portion of 
two LRs overlap with each other, -Etotai would be nearly invariant under the mixing of two 
orbitals which belong to these LRs. This near-redundancy is considered the major source 
of slow convergence and local minima problem. 6 ' 43 If these LRs are denoted by L\ and L2, 
we can expect that K\ € L2 and K2 £ L±, since the KRs are located near the centers of 
LRs. Then, Eq.(5) gives two constraints on these orbitals, which can eliminate the near- 
redundancy associated with L\ and L^- On the other hand, if only a small portion of L\ and 
L2 overlap, they do not cause any problems, as shown in the previous section. Therefore, the 
above observation for the extended orbitals remains essentially valid even if the orbitals are 
localized. 

In the limiting case of large (yet nonoverlapping) KRs, the static kernel functions will 
be rather good approximations to the MLWFs. If the kernel functions are regarded as the 
zeroth-order approximation to the ground state, the orbitals can be written as follows: 



which is in close analogy with the case of perturbation theory. 55 ' 56 Note, however, that our 
calculations are fully self-consistent. On the other hand, if the precise positions of MLWF 
centers are known a priori, e.g. in perfect crystals, the KRs can be chosen infinitesimally 
small, in which case each kernel function would be a 5- function. Then, Eq.(5) reduces to 
tpi(rj) = 0, where rj denotes the position of Kj. Since we can expect that Y>j(rj) 7^ for 
any i, these constraints will guarantee the linear independence of the orbitals. While it may 
seem counterintuitive, the total energy is systematically lower for smaller KRs, if each KR 
is chosen appropriately. This is explained as follows. When KRs are large, the ground state 
orbitals resemble the conventional MLWFs, which suffer from large orthogonalization tails. 
As the KRs become smaller, the influence of Eq.(5) becomes more local, thus reducing the 
orthogonalization tails of the orbitals. Therefore, from the point of view of minimizing the 
errors in total energy for given LRs, the KRs should be chosen as small as possible. 



ipi>= \ xi> +!<%> . 



(10) 



Then, the constraints of Eq.(5) would be equivalent to 



<Xj\fyi>= °> 



(11) 
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So far, we have implicitly assumed that the positions of MLWFs centers, which are re- 
quired for the determination of KRs and LRs, are known a priori with sufficient accuracy. 
Fortunately, in many systems with large energy gaps, e.g. in liquid water, the electrons form a 
closed shell. Then, approximate positions of MLWF centers are available based solely on the 
knowledge of chemistry. If, however, part of the system consists of complex atomic configu- 
rations with unknown electronic structures, it would be difficult to choose the KRs and LRs 
appropriately. One possible solution to this problem is the implementation of the adaptive 
localization centers, 26,27 which gives approximate positions of MLWF centers without a priori 
knowledge of the system. An alternative approach is to use extended LRs for the orbitals, the 
behavior of which is unpredictable. This problem will be discussed in more detail in future 
publications. 
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5. CONCLUSION 

In this article, we have shown that the linear scaling method based on OMM can be as 
robust as the conventional algorithm using extended orbitals, when augmented with additional 
constraints to guarantee linear independence of the orbitals. Although it is difficult to give a 
general proof, AOMM appears to overcome the slow convergence and local minima problem 
of OMM, provided that LRs, KRs, and kernel functions are chosen appropriately. A more 
detailed study on the performance of AOMM in molecular-dynamics simulations is under 
way, and will be reported in a forthcoming paper. 
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Table I. Simulation details for each run. M and AE denote the average number of iterations and the 
error in total energy for the initial configuration, respectively. 





Method 


R hR (Bohr) 


R KR (Bohr) 


M 


AE (Ry) 


(a) 


OMM 


oo 




12.0 





(b) 


AOMM 


8 


0.8 


14.1 


0.13456 


(c) 


AOMM 


10 


0.8 


14.8 


0.02040 


(d) 


AOMM 


12 


0.8 


14.9 


0.00309 



14/?? 



Phys. Soc. Jpn. 



Full Paper 





1. (a) Conventional definition of the localization regions in OMM (solid lines). Filled circles 
denote the centers of localization, which are usually either atomic positions or bond centers, (b) 
Localization regions in AOMM. Kernel regions are shown with dashed lines. 
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Fig. 2. Flow chart of AOMM for a given ionic configuration. The loop is repeated until a convergence 
criterion is satisfied. Update of x can be skipped if static kernel functions are used. 
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Fig. 3. Five square potential wells of widths 9 are centered at 40,60,80,100, and 120. 
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Fig. 4. MLWF and kernel functions which belong to the central KR. 
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Fig. 5. (a) Number of unsuccessful calculations, (b) Average number of iterations, (c) Relative errors 
in total energy, (d) Determinant of the overlap matrix, (e) Average spread of the orbitals. 
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Fig. 6. (a) Localized orbitals which belong to the central LR (i?LR = 50), obtained from OMM and 
AOMM (i? K R=2 and 9)). (b) Enlarged view of (a). 
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Fig. 7. (a) Time evolution of the total energy and the ionic potential energy when all orbitals are 
extended, (b) Conservation of the total energy during the simulations. Total energy of the initial 
configuration is chosen as the origin for e3e'li??un. (c) Errors in ionic temperature during the 



